Information transfer with small-amplitude signals 
We study the optimality conditions of information transfer in systems with memory in the low
signal-to-noise ratio regime of vanishing input amplitude. We find that the optimal mutual
information is achieved by a maximum-variance signal time course, with the correlation structure
determined by the Fisher information matrix. We illustrate the method on a simple biologically
inspired model of an electrosensory neuron. Our general results apply also to the study of
information transfer in single neurons subject to weak stimulation, with implications for the
problem of coding efficiency in biological systems.
Theoretical approaches to the problem of information processing in biological (neuronal)
systems have received significant attention over the past few decades [?, ?], with information
theory [?] providing the fundamental framework [?-12]. Of particular interest are the
optimality conditions under which the information between stimuli and responses is maximized
[13-17], leading to the efficient coding hypothesis [?]. Due to the nonlinear nature of
information-theoretic measures, explicitly formulated optimality conditions are relatively rare
[?]; nevertheless, numerical methods exploiting properties of mutual information are available
[?, 21]. Recently, the asymptotic relation between mutual information and Fisher information
[13, ?] has been employed for the analysis of optimality conditions in the setting of large
neuronal populations and large output signal-to-noise ratios (SNR) [1, 24].

In this paper we examine the effect of vanishing signal amplitude on the information transfer.
We are motivated by the situation observed in sensory neurons, which in many cases are known to
respond to weak stimulus intensities (relative to the external or internal noise sources)
[?, 25, 26]. Information transfer in channels subject to input cost constraints, with
implications for low-SNR conditions, has also been of interest in the information-theoretic
literature [20]. In this paper we employ a different setting and examine information transfer
in channels with memory under a vanishing stimulus amplitude constraint. We explicitly consider
the effect of channel memory, since many realistic systems exhibit this property on various
time scales, and furthermore the presence of memory is known to enhance information transfer in
many cases [?, 27-29]. Finally, we apply the theory to calculate the effect of memory on
information transmission in a simple neuronal model [30, 31]. This system exhibits the
stochastic resonance effect, which is commonly understood to be the noise-induced enhancement
of the system sensitivity to a weak signal [32] (although signal weakness is not a necessary
condition for stochastic resonance to occur [33]).



Throughout this paper we assume a discrete-time setting, i.e., we denote the consecutive
responses of a single stochastic neuronal unit as a vector of random variables (discrete or
continuous) $\mathbf{R} = (\{R_i\}_{i=1}^{n})^{T}$, where $i$ indexes the time and $(\cdot)^{T}$
denotes the transposition. The response, $R_i = r_i$, is invoked by the stimulus,
$\Theta_i = \theta_i$, where the stimulus course in time is described by an $n$-dimensional
vector of random variables (r.v.) $\boldsymbol{\Theta}$. We account for the memory of the
neuron, so that $R_i$ generally depends on the current, but also on past stimulations and
responses. In the following we assume that the neuronal model is realized by a stationary
causal discrete-time information channel with continuous input, fully described by the
conditional probability density function $f(\mathbf{r}|\boldsymbol{\theta})$, which factorizes
as [?]

$$f(\mathbf{r}|\boldsymbol{\theta}) = \prod_{i=1}^{n} f_i(r_i \,|\, \theta_i, \theta_{i-1}, \dots, \theta_1, r_{i-1}, \dots, r_1). \tag{1}$$



In our setting we do not consider channel feedback, i.e., 
dependence of current stimulus on past responses. 

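The two most well-known information measures, Fisher information (FI) and Shannon's mutual
information (MI), rely on $f(\mathbf{r}|\boldsymbol{\theta})$. The FI (matrix) is often
employed as a measure of the efficiency of the population coding [?],

$$\mathbf{J}(\boldsymbol{\theta}|\mathbf{R}) = \left\langle [\nabla \ln f(\mathbf{r}|\boldsymbol{\theta})]\,[\nabla \ln f(\mathbf{r}|\boldsymbol{\theta})]^{T} \right\rangle_{\mathbf{r}|\boldsymbol{\theta}}, \tag{2}$$

where the gradient is with respect to $\boldsymbol{\theta}$, and
$\langle\cdot\rangle_{\mathbf{r}|\boldsymbol{\theta}}$ denotes averaging with respect to
$f(\mathbf{r}|\boldsymbol{\theta})$. Throughout this paper we assume that
$f(\mathbf{r}|\boldsymbol{\theta})$ is sufficiently smooth in $\boldsymbol{\theta}$, so that
the following regularity conditions [34] hold,

$$\int \nabla f(\mathbf{r}|\boldsymbol{\theta})\, d\mathbf{r} = \mathbf{0}, \qquad \int \nabla\nabla^{T} f(\mathbf{r}|\boldsymbol{\theta})\, d\mathbf{r} = \mathbf{0}. \tag{3}$$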
FI imposes limits on the precision with which $\boldsymbol{\theta}$ can be estimated from the
responses: the variance of any unbiased estimator $\hat{\theta}_i$ of $\theta_i$ satisfies
$\mathrm{Var}(\hat{\theta}_i) \ge [\mathbf{J}^{-1}(\boldsymbol{\theta}|\mathbf{R})]_{ii}$ [34].
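
As a small illustration of the FI definition in Eq. (2) and of the bound above, the following Python sketch evaluates the scalar FI of a discrete-valued response by central finite differences. The helper name `fisher_information_discrete` and the toy Bernoulli model, in which the firing probability equals the parameter itself, are assumptions made purely for illustration; they are not the neuronal model analyzed in this paper.

```python
def fisher_information_discrete(prob_fn, theta, h=1e-6):
    """Scalar FI, J = sum_r (d Pr{r|theta} / d theta)^2 / Pr{r|theta}.

    prob_fn(theta) returns the list of response probabilities Pr{r|theta};
    the derivative is approximated by a central finite difference.
    """
    p = prob_fn(theta)
    p_plus = prob_fn(theta + h)
    p_minus = prob_fn(theta - h)
    total = 0.0
    for pr, pp, pm in zip(p, p_plus, p_minus):
        if pr > 0.0:
            d = (pp - pm) / (2.0 * h)
            total += d * d / pr
    return total


def bernoulli(theta):
    """Toy response model (illustrative assumption): Pr{R=1|theta} = theta."""
    return [1.0 - theta, theta]


J = fisher_information_discrete(bernoulli, 0.5)

# Cramer-Rao check: the mean of m independent responses is an unbiased
# estimator of theta with variance theta*(1-theta)/m, equal to 1/(m*J) here.
m = 100
crb = 1.0 / (m * J)
```

For this toy model the analytic value is $J = 1/[\theta(1-\theta)]$, so the bound is attained by the sample mean.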

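MI is the fundamental quantity measuring information transfer in channels [4]. MI gives the
degree of statistical dependence between stimuli and responses and is defined as

$$I(\boldsymbol{\Theta};\mathbf{R}) = \left\langle \left\langle \ln \frac{f(\mathbf{r}|\boldsymbol{\theta})}{p(\mathbf{r})} \right\rangle_{\mathbf{r}|\boldsymbol{\theta}} \right\rangle_{\boldsymbol{\theta}}, \tag{4}$$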
where $p(\mathbf{r}) = \langle f(\mathbf{r}|\boldsymbol{\theta})\rangle_{\boldsymbol{\theta}}$
describes the marginal distribution of responses, and the averaging is with respect to the
distribution of stimuli, $\pi(\boldsymbol{\theta})$, so that MI is essentially a property of
the joint distribution of stimuli and responses. The maximum value of MI per time step, taken
over all possible stimulus distributions, is the information capacity (or capacity rate), $C$,
defined as [3],

$$C = \lim_{n\to\infty} \max_{\pi(\boldsymbol{\theta})} \frac{1}{n}\, I(\boldsymbol{\Theta};\mathbf{R}). \tag{5}$$



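FI is a local quantity in the sense that for some $\boldsymbol{\theta}_0$,
$\mathbf{J}(\boldsymbol{\theta}_0|\mathbf{R})$ takes into account stimuli from an infinitesimal
neighbourhood of $\boldsymbol{\theta}_0$. In other words, if we assume that FI is a real
quantity, i.e., something that can be measured and taken into account, then the stimuli from
the neighbourhood of $\boldsymbol{\theta}_0$ have to be physically present, which makes FI
analogous to MI in the following sense. Let the stimuli be restricted in amplitude, so that for
some $\boldsymbol{\theta}_0$ and $\Delta\boldsymbol{\theta}$ it holds that
$\boldsymbol{\Theta} \in [\boldsymbol{\theta}_0 - \Delta\boldsymbol{\theta}, \boldsymbol{\theta}_0 + \Delta\boldsymbol{\theta}]$
and $\Delta\theta_i > 0$. We define a shifted r.v. $\delta\boldsymbol{\Theta}$ as
$\delta\boldsymbol{\Theta} = \boldsymbol{\Theta} - \boldsymbol{\theta}_0$ and rewrite the MI
from Eq. (4) in terms of the r.v. $\delta\boldsymbol{\Theta} \sim \pi(\delta\boldsymbol{\theta})$ as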

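$$I(\boldsymbol{\Theta};\mathbf{R}) = \left\langle \int \left[\varphi(\mathbf{r}, \boldsymbol{\theta}_0 + \delta\boldsymbol{\theta}) - \psi(\mathbf{r}, \boldsymbol{\theta}_0 + \delta\boldsymbol{\theta})\right] d\mathbf{r} \right\rangle_{\delta\boldsymbol{\theta}}, \tag{6}$$

by further introducing

$$\varphi(\mathbf{r}, \boldsymbol{\theta}_0 + \delta\boldsymbol{\theta}) = f(\mathbf{r}|\boldsymbol{\theta}_0 + \delta\boldsymbol{\theta}) \ln f(\mathbf{r}|\boldsymbol{\theta}_0 + \delta\boldsymbol{\theta}), \tag{7}$$

$$\psi(\mathbf{r}, \boldsymbol{\theta}_0 + \delta\boldsymbol{\theta}) = f(\mathbf{r}|\boldsymbol{\theta}_0 + \delta\boldsymbol{\theta}) \ln \left\langle f(\mathbf{r}|\boldsymbol{\theta}_0 + \delta\boldsymbol{\theta}) \right\rangle_{\delta\boldsymbol{\theta}}. \tag{8}$$

Now we consider the case of vanishing amplitude,
$\|\Delta\boldsymbol{\theta}\| \ge \|\delta\boldsymbol{\theta}\| \to 0$, and expand
$I(\boldsymbol{\Theta};\mathbf{R})$ in Eq. (6) around $\boldsymbol{\theta}_0$ in terms of
$\delta\boldsymbol{\theta}$. It can be shown [3?] that

$$\ln \left\langle f(\mathbf{r}|\boldsymbol{\theta}_0 + \delta\boldsymbol{\theta}) \right\rangle_{\delta\boldsymbol{\theta}} \approx \ln f(\mathbf{r}|\boldsymbol{\theta}_0 + \langle\delta\boldsymbol{\theta}\rangle), \tag{9}$$

where $\langle\delta\boldsymbol{\theta}\rangle = \langle\delta\boldsymbol{\theta}\rangle_{\delta\boldsymbol{\theta}}$,
and thus the Taylor expansion of $\psi = \psi(\mathbf{r}, \boldsymbol{\theta}_0 + \delta\boldsymbol{\theta})$ is

$$\psi \approx f \ln f + \delta\boldsymbol{\theta}^{T} \ln f\, \nabla f + \langle\delta\boldsymbol{\theta}\rangle^{T} \nabla f + \frac{1}{2}\, \delta\boldsymbol{\theta}^{T} \ln f\, \nabla\nabla^{T} f\, \delta\boldsymbol{\theta} + \delta\boldsymbol{\theta}^{T}\, \frac{\nabla f\, \nabla^{T} f}{f}\, \langle\delta\boldsymbol{\theta}\rangle + \frac{1}{2}\, \langle\delta\boldsymbol{\theta}\rangle^{T} \left[\nabla\nabla^{T} f - \frac{\nabla f\, \nabla^{T} f}{f}\right] \langle\delta\boldsymbol{\theta}\rangle, \tag{10}$$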

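where $f = f(\mathbf{r}|\boldsymbol{\theta}_0)$ and
$\nabla f = \nabla f(\mathbf{r}|\boldsymbol{\theta})|_{\boldsymbol{\theta}=\boldsymbol{\theta}_0}$.
The analogous expansion of $\varphi$ is straightforward. By substituting the expansions into
Eq. (6), the zeroth- and first-order terms cancel (after integration over $\mathbf{r}$, by
virtue of Eq. (3)) and what remains can be written in terms of the FI matrix evaluated at
$\boldsymbol{\Theta} = \boldsymbol{\theta}_0$, by employing
$\mathbf{J}(\boldsymbol{\theta}_0|\mathbf{R}) = [\mathbf{J}(\boldsymbol{\theta}_0|\mathbf{R})]^{T}$, as

$$I(\boldsymbol{\Theta};\mathbf{R}) \approx \frac{1}{2} \left\langle [\delta\boldsymbol{\theta} - \langle\delta\boldsymbol{\theta}\rangle]^{T}\, \mathbf{J}(\boldsymbol{\theta}_0|\mathbf{R})\, [\delta\boldsymbol{\theta} - \langle\delta\boldsymbol{\theta}\rangle] \right\rangle_{\delta\boldsymbol{\theta}}, \tag{11}$$

and after taking the expectation

$$I(\boldsymbol{\Theta};\mathbf{R}) \approx \frac{1}{2}\, \mathrm{tr}\left[\mathbf{J}(\boldsymbol{\theta}_0|\mathbf{R})\, \mathbf{C}_{\Theta}\right], \tag{12}$$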

where $\mathbf{C}_{\Theta}$ is the covariance matrix of $\boldsymbol{\Theta}$ and
$\mathrm{tr}(\cdot)$ is the matrix trace. Eq. (12) holds for a broad class of channels with
memory, both biologically inspired and artificial, and represents the main result of this paper.
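
The step from the averaged quadratic form, Eq. (11), to the trace form, Eq. (12), can be checked numerically on any discrete stimulus ensemble. The sketch below is a minimal illustration under stated assumptions: the 2x2 FI matrix and the three-point stimulus ensemble are hypothetical toy values, and all function names are introduced here for illustration only. Results are in nats.

```python
def mean_vector(points, probs):
    """Mean of a discrete stimulus ensemble given as points with probabilities."""
    n = len(points[0])
    return [sum(p * x[i] for x, p in zip(points, probs)) for i in range(n)]


def covariance_matrix(points, probs):
    """Covariance matrix C of the discrete stimulus ensemble."""
    mu = mean_vector(points, probs)
    n = len(mu)
    return [[sum(p * (x[i] - mu[i]) * (x[k] - mu[k]) for x, p in zip(points, probs))
             for k in range(n)] for i in range(n)]


def mi_quadratic_form(J, points, probs):
    """Eq. (11): 0.5 * < (d - <d>)^T J (d - <d>) > over the ensemble."""
    mu = mean_vector(points, probs)
    n = len(mu)
    total = 0.0
    for x, p in zip(points, probs):
        d = [x[i] - mu[i] for i in range(n)]
        total += p * sum(d[i] * J[i][k] * d[k] for i in range(n) for k in range(n))
    return 0.5 * total


def mi_trace_form(J, C):
    """Eq. (12): 0.5 * tr[J C]."""
    n = len(J)
    return 0.5 * sum(J[i][k] * C[k][i] for i in range(n) for k in range(n))


# Hypothetical FI matrix and small-amplitude stimulus ensemble (toy values).
J = [[2.0, 0.5], [0.5, 2.0]]
points = [(0.1, 0.1), (-0.1, -0.1), (0.1, -0.1)]
probs = [0.4, 0.4, 0.2]

C = covariance_matrix(points, probs)
mi_11 = mi_quadratic_form(J, points, probs)
mi_12 = mi_trace_form(J, C)
```

Both expressions agree to rounding error, since the expectation of the quadratic form depends on the stimulus distribution only through its covariance matrix.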

Next we concentrate on the interpretation and some immediate implications of Eq. (12). First,
the information capacity from Eq. (5) follows readily from Eq. (12): the FI matrix is a
property of the neuronal model, so the stimulus properties are represented by
$\mathbf{C}_{\Theta}$. Maximizing $I(\boldsymbol{\Theta};\mathbf{R})$ thus corresponds to
extremizing the values of $[\mathbf{C}_{\Theta}]_{ik}$ for which the corresponding elements
$[\mathbf{J}(\boldsymbol{\theta}_0|\mathbf{R})]_{ik}$ are non-zero (with appropriate sign).
E.g., for a memoryless channel,
$f(\mathbf{r}|\boldsymbol{\theta}) = \prod_{i=1}^{n} f_i(r_i|\theta_i)$, so the FI matrix is
diagonal with elements $[\mathbf{J}(\boldsymbol{\theta}_0|\mathbf{R})]_{ii} = J(\theta_0|R)$
(omitting the index $i$ due to channel stationarity). The capacity is thus achieved by
maximizing the variance of the amplitude-constrained stimulus, so the capacity-bearing
distribution is realized by two equiprobable probability masses located at the interval
extremes, and

$$C = \frac{1}{2}\,(\Delta\theta)^{2}\, J(\theta_0|R), \tag{13}$$

a result obtained by different means in [20]. Generally, $I(\boldsymbol{\Theta};\mathbf{R}) \to 0$
as the stimulus amplitude vanishes. It is thus advantageous to introduce the MI (and capacity)
per maximum stimulus power, i.e., $I(\boldsymbol{\Theta};\mathbf{R})/\|\Delta\boldsymbol{\theta}\|^{2}$,
so that for the memoryless channel $C = J(\theta_0|R)/2$, as obtained in [20]. While the
previously mentioned asymptotics of MI in terms of FI [?, 24] deal with the low-noise limit of
information transmission (i.e., large neuronal populations), Eq. (12) describes the opposite
"large-noise" limit.

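The memoryless result, Eq. (13), can be compared against an exact MI calculation for a binary-output channel driven by two equiprobable stimuli at the interval extremes. The following Python sketch is illustrative only: the logistic response curve `spike_prob` is a hypothetical stand-in for $\Pr\{R=1|\theta\}$ (it is not the model used in this paper), and all function names are assumptions. Values are in nats.

```python
import math


def spike_prob(theta):
    """Hypothetical smooth response curve Pr{R=1|theta} (logistic)."""
    return 1.0 / (1.0 + math.exp(-theta))


def binary_entropy(p):
    """Entropy of a binary variable in nats."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def exact_mi_two_point(theta0, dtheta):
    """Exact MI for equiprobable stimuli at theta0 - dtheta and theta0 + dtheta."""
    p_lo = spike_prob(theta0 - dtheta)
    p_hi = spike_prob(theta0 + dtheta)
    p_marg = 0.5 * (p_lo + p_hi)
    return binary_entropy(p_marg) - 0.5 * (binary_entropy(p_lo) + binary_entropy(p_hi))


def fisher_information(theta0):
    """FI of the logistic Bernoulli model: (dP/dtheta)^2 / (P(1-P)) = P(1-P)."""
    p = spike_prob(theta0)
    return p * (1.0 - p)


def weak_signal_capacity(theta0, dtheta):
    """Eq. (13): C = 0.5 * dtheta^2 * J(theta0)."""
    return 0.5 * dtheta ** 2 * fisher_information(theta0)


# Ratio of exact MI to the weak-signal approximation for shrinking amplitude.
ratios = {d: exact_mi_two_point(0.0, d) / weak_signal_capacity(0.0, d)
          for d in (0.5, 0.1, 0.01)}
```

Under these assumptions the ratio approaches one as the amplitude shrinks, which is the sense in which the approximation becomes exact for vanishing stimulus amplitude.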
In the following we apply Eq. (12) to the classical McCulloch-Pitts (MP) neuronal model,
accounting for the memory of the noise component. A memoryless variant of the MP model has been
successfully employed in describing the stochastic resonance effect in electrosensory neurons of
paddlefish [3?], and further analyzed in detail in [31, 3?]. The MP model is based on
thresholding of the stimulus (corrupted by an additive noise $X$), so that the discrete-valued
response in time step $i$ is

$$R_i = U(\theta_i + X_i - a), \tag{14}$$



where $a$ is the threshold, $U(\cdot)$ is the Heaviside step function and
$\theta_i \in [\theta_0 - \Delta\theta, \theta_0 + \Delta\theta]$ for all $i$. The occurrence of
an action potential at time $i$ is indicated by $R_i = 1$. In the following we consider the
noise r.v. $\mathbf{X} = \{X_1, \dots, X_n\}^{T}$ to be identically distributed but dependent,
which provides the memory effect for the MP neuron. For simplicity, we assume in the following
that $\mathbf{X} \sim p(\mathbf{x})$ is gaussian with covariance matrix
$[\mathbf{C}_X]_{ik} = \sigma^{2}\varrho_{ik}$, where $\varrho_{ik} = \mathrm{corr}(X_i, X_k)$ is
the serial correlation coefficient. Obviously, since $U$ is not invertible, any simple form of
dependence in the noise (such as first-order Markov) is not preserved in the time sequence of
responses. Generally, the full joint distribution of $\mathbf{R}$ is required, which means
evaluation of $n$-dimensional gaussian integrals, which may not be numerically stable. In other
words, the joint conditional probabilities $\Pr\{\mathbf{R}|\boldsymbol{\theta}\}$ are generally
not tractable for reasonable values of $n$. The idea is to substitute the full and intractable
log-likelihood, $\ell(\boldsymbol{\theta}|\mathbf{r}) = \ln f(\mathbf{r}|\boldsymbol{\theta})$,
with a computable pseudo-log-likelihood [37], $\ell^{(p)}(\boldsymbol{\theta}|\mathbf{r})$,
neglecting some high-order dependencies, i.e.,

$$\ell^{(p)}(\boldsymbol{\theta}|\mathbf{r}) = \sum_{j} \ell^{(p)}_{j}(\boldsymbol{\theta}|\mathbf{r}), \tag{15}$$



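where $\ell^{(p)}_{j}(\boldsymbol{\theta}|\mathbf{r})$ are "computable" partitions. Here we
concentrate on a variant of the second-order pseudo-log-likelihood,
$\ell^{(p)}(\boldsymbol{\theta}|\mathbf{r}) = \ell_2(\boldsymbol{\theta}|\mathbf{r})$, based on
pairwise dependence [?],

$$\ell_2(\boldsymbol{\theta}|\mathbf{r}) = \sum_{i=2}^{n} \sum_{k=1}^{i-1} \ln \Pr\{R_i, R_k|\boldsymbol{\theta}\} - (n-2) \sum_{i=1}^{n} \ln \Pr\{R_i|\boldsymbol{\theta}\}. \tag{16}$$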
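
The structure of the pairwise pseudo-log-likelihood can be sketched in a few lines of Python. The sketch assumes the form of Eq. (16) with weight $(n-2)$ on the marginal term; the callables `pair_prob` and `marg_prob` are hypothetical placeholders for the bivariate and marginal response probabilities, and the independent binary responses used in the check are a toy case chosen only because the answer is known.

```python
import math


def pairwise_pseudo_loglik(r, pair_prob, marg_prob):
    """Second-order pseudo-log-likelihood in the form assumed for Eq. (16).

    r            : sequence of observed responses r_1, ..., r_n
    pair_prob    : function (i, k, r_i, r_k) -> Pr{R_i = r_i, R_k = r_k}
    marg_prob    : function (i, r_i) -> Pr{R_i = r_i}
    """
    n = len(r)
    # Sum of log bivariate probabilities over all pairs i > k.
    pair_term = sum(math.log(pair_prob(i, k, r[i], r[k]))
                    for i in range(1, n) for k in range(i))
    # Each response enters (n - 1) pairs; subtracting (n - 2) marginal terms
    # leaves exactly one marginal contribution per response.
    marg_term = sum(math.log(marg_prob(i, r[i])) for i in range(n))
    return pair_term - (n - 2) * marg_term


# Sanity check on independent binary responses with Pr{R_i = 1} = p (toy value).
p = 0.3


def marg(i, ri):
    return p if ri == 1 else 1.0 - p


def pair(i, k, ri, rk):
    return marg(i, ri) * marg(k, rk)


r = [1, 0, 1, 1]
ell2 = pairwise_pseudo_loglik(r, pair, marg)
full = sum(math.log(marg(i, ri)) for i, ri in enumerate(r))
```

For independent responses the pseudo-log-likelihood coincides with the full log-likelihood, which is a useful consistency check before dependent noise is introduced.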

The advantage of $\ell_2$ is that most of the involved integrals can be expressed in a
semi-closed form for gaussian noise. The problems of replacing
$\ell(\boldsymbol{\theta}|\mathbf{r})$ by $\ell_2(\boldsymbol{\theta}|\mathbf{r})$ for
non-Markov models have been investigated recently in the statistical literature [?, 39]. The
marginal probability $P_1$ of $R_i = 1$ (crossing the threshold) is independent of $i$ due to
stationarity, and since $R_i \in \{0, 1\}$, we can write
$\Pr\{R_i|\theta_i\} = r_i P_1 + (1 - r_i)(1 - P_1)$, where

$$P_1 = \frac{1}{2}\left[1 - \mathrm{erf}\left(\frac{a - \theta_i}{\sigma\sqrt{2}}\right)\right], \tag{17}$$



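by evaluation of the gaussian integral, and $\mathrm{erf}(\cdot)$ is the error function.
Similarly, for the bivariate joint response probability it holds that

$$\Pr\{R_i, R_k|\theta_i, \theta_k\} = r_i r_k P_{11} + r_i (1 - r_k) P_{10} + (1 - r_i) r_k P_{01} + (1 - r_i)(1 - r_k) P_{00}, \tag{18}$$

where $P_{mn} = P_{mn}(\theta_i, \theta_k)$ is the probability of $R_i = m$, $R_k = n$, so
$\sum_{mn} P_{mn} = 1$. Note that $P_{11} + P_{01}$ is the marginal probability of $R_k = 1$,
and $P_{11} + P_{10} = P_1$ is the marginal probability of $R_i = 1$, Eq. (17). These symmetries
and Eq. (17) give

$$P_{11} = \frac{1}{2}\int_{0}^{\infty} \left[1 + \mathrm{erf}\left(\frac{\theta_i - a + (a - \theta_k + y)\,\varrho_{ik}}{\sigma\sqrt{2(1 - \varrho_{ik}^{2})}}\right)\right] \phi(y - \theta_k + a)\, dy, \tag{19}$$

$$P_{01} = \frac{1}{2}\left[1 - \mathrm{erf}\left(\frac{a - \theta_k}{\sigma\sqrt{2}}\right)\right] - P_{11}, \tag{20}$$

$$P_{10} = \frac{1}{2}\left[1 - \mathrm{erf}\left(\frac{a - \theta_i}{\sigma\sqrt{2}}\right)\right] - P_{11}, \tag{21}$$

$$P_{00} = 1 - P_{11} - P_{01} - P_{10}, \tag{22}$$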

where $\phi(\cdot)$ is the probability density function of a gaussian r.v. with zero mean and
variance equal to $\sigma^{2}$ (note that $P_{mn}$ are functions of $\theta_i$, $\theta_k$, $a$,
$\sigma$ and $\varrho_{ik}$).

FIG. 1. Information capacity (in bits) per vanishing stimulus power of the McCulloch-Pitts
neuronal model with memory. The noise is a gaussian AR(1) process with first-order serial
correlation $\varrho$ and variance $\sigma^{2}$. Stimulation parameters are: $\theta_0 = 0$ and
threshold $a = 1$. Three situations are shown: no memory (see also [31, 3?]; corresponds to
$\varrho = 0$), Markov (assuming the first-order Markov structure of responses) and
pseudo-log-likelihood, the $\ell_2$-approximation to the true situation, estimated for
$n = 100$. The memory of the neuron enhances its information capacity by reducing the
disruptive power of the noise. Note that positive noise correlations increase capacity more
than negative ones.

The FI matrix will generally have all elements non-zero, and its approximation by $\ell_2$ is



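$$[\mathbf{J}(\boldsymbol{\theta}|\mathbf{R})]_{ik} \approx -\sum_{\mathbf{r}} \frac{\partial^{2} \ell_2(\boldsymbol{\theta}|\mathbf{r})}{\partial\theta_i\, \partial\theta_k}\, \Pr\{R_1 = r_1, \dots, R_n = r_n|\boldsymbol{\theta}\}, \tag{23}$$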



where the sum is over all possible $n$-dimensional vectors consisting of 0's and 1's. Due to
the particular form of $\ell_2(\boldsymbol{\theta}|\mathbf{r})$, however, things are a lot
simpler, although details of the following calculations will be published elsewhere. For the
off-diagonal, $i \ne k$, and diagonal elements evaluated at $\theta_i = \theta_k = \theta_0$ it
holds that



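$$[\mathbf{J}(\boldsymbol{\theta}|\mathbf{R})]_{ik} = \gamma(\theta_0, a, \sigma, \varrho_{ik}, P_{11}, \partial_{01}), \tag{24}$$

$$[\mathbf{J}(\boldsymbol{\theta}|\mathbf{R})]_{ii} = \omega(\theta_0, a, \sigma, \varrho_{ik}, P_{11}), \tag{25}$$

where $\gamma(\cdot)$ and $\omega(\cdot)$ are complicated (but tabulated) functions of the
indicated parameters, and

$$P_{11} = P_{11}(\theta_0, \theta_0), \tag{26}$$

$$\partial_{01} = \left.\frac{\partial}{\partial\theta_k} P_{11}(\theta_0, \theta_k)\right|_{\theta_k = \theta_0}. \tag{27}$$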
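
The pairwise response probabilities of the MP neuron can be evaluated numerically with the standard library alone. The sketch below is a hedged illustration rather than the computation used for the reported results: it assumes the conditional-gaussian form of $P_{11}$ (an integral over $y \ge 0$ of an error-function factor times the shifted gaussian density) together with Eq. (17) for the marginals, and it uses a plain composite Simpson rule. The function names, the integration cutoff and the number of steps are arbitrary choices made for illustration.

```python
import math


def gauss_pdf(y, sigma):
    """Density of a zero-mean gaussian with variance sigma^2."""
    return math.exp(-0.5 * (y / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def p1(theta, a, sigma):
    """Marginal firing probability, Eq. (17)."""
    return 0.5 * (1.0 - math.erf((a - theta) / (sigma * math.sqrt(2.0))))


def p11(theta_i, theta_k, a, sigma, rho, steps=4000):
    """Pr{R_i = 1, R_k = 1} by composite Simpson quadrature.

    Requires |rho| < 1 and an even number of steps. The upper limit is an
    assumed cutoff beyond which the gaussian factor is negligible.
    """
    scale = sigma * math.sqrt(2.0 * (1.0 - rho * rho))
    upper = max(0.0, theta_k - a) + 12.0 * sigma

    def integrand(y):
        arg = (theta_i - a + (a - theta_k + y) * rho) / scale
        return 0.5 * (1.0 + math.erf(arg)) * gauss_pdf(y - theta_k + a, sigma)

    h = upper / steps
    total = integrand(0.0) + integrand(upper)
    for j in range(1, steps):
        total += (4.0 if j % 2 else 2.0) * integrand(j * h)
    return total * h / 3.0


def joint_probabilities(theta_i, theta_k, a, sigma, rho):
    """All four pairwise probabilities, using the marginal symmetries."""
    q11 = p11(theta_i, theta_k, a, sigma, rho)
    q10 = p1(theta_i, a, sigma) - q11
    q01 = p1(theta_k, a, sigma) - q11
    q00 = 1.0 - q11 - q10 - q01
    return {"P11": q11, "P10": q10, "P01": q01, "P00": q00}


# Stimulus exactly at threshold: the orthant probability is known in closed form.
P = joint_probabilities(1.0, 1.0, a=1.0, sigma=1.0, rho=0.5)
orthant = 0.25 + math.asin(0.5) / (2.0 * math.pi)
```

Two checks are convenient: for $\varrho_{ik} = 0$ the joint probability factorizes into the product of marginals, and for a stimulus sitting exactly at the threshold $P_{11}$ reduces to the bivariate gaussian orthant probability $1/4 + \arcsin(\varrho_{ik})/(2\pi)$.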
Employing Eq. (12) gives the covariance matrix of the optimal stimulation as

$$[\mathbf{C}_{\Theta}]_{ik} = (\Delta\theta)^{2}\, \mathrm{sgn}\left([\mathbf{J}(\boldsymbol{\theta}_0|\mathbf{R})]_{ik}\right), \tag{28}$$

where $\mathrm{sgn}(\cdot)$ is the signum function. The capacity rate per vanishing stimulus
power is then

$$C = \lim_{n\to\infty} \frac{1}{2n} \sum_{i,k} \left|[\mathbf{J}(\boldsymbol{\theta}_0|\mathbf{R})]_{ik}\right|. \tag{29}$$
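
The relation between the optimal covariance matrix, Eq. (28), and the capacity rate per stimulus power, Eq. (29), can be illustrated for a finite number of time steps. In the Python sketch below the 3x3 FI matrix consists of hypothetical toy numbers (it is not computed from the MP model), the function names are assumptions, and the limit $n \to \infty$ is replaced by a fixed $n$. The sketch does not verify that the sign pattern yields a valid (positive semidefinite) covariance matrix; for the toy matrix chosen here it does.

```python
def sign(x):
    """Signum function returning -1, 0 or 1."""
    return (x > 0) - (x < 0)


def optimal_covariance(J, dtheta):
    """Eq. (28): [C]_ik = dtheta^2 * sgn(J_ik)."""
    return [[dtheta ** 2 * sign(x) for x in row] for row in J]


def capacity_per_power(J):
    """Finite-n version of Eq. (29): (1 / 2n) * sum_ik |J_ik|."""
    n = len(J)
    return sum(abs(x) for row in J for x in row) / (2.0 * n)


# Hypothetical FI matrix with alternating-sign off-diagonal elements (toy values).
J = [[2.0, -0.5, 0.1],
     [-0.5, 2.0, -0.5],
     [0.1, -0.5, 2.0]]
dtheta = 0.05
n = len(J)

C = optimal_covariance(J, dtheta)

# Eq. (12) evaluated at the optimal covariance, per time step and per power.
mi_rate = 0.5 * sum(J[i][k] * C[k][i] for i in range(n) for k in range(n)) / (n * dtheta ** 2)
cap = capacity_per_power(J)
```

Substituting the sign-matched covariance into the trace formula turns every product $J_{ik} C_{ki}$ into $(\Delta\theta)^{2} |J_{ik}|$, which is why the two quantities coincide.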

Fig. 1 shows how the memory of the neuron enhances its information capacity (shown as capacity
per vanishing stimulus power). We assumed that the noise r.v. $\mathbf{X}$ is modelled by the
AR(1) gaussian discrete-time process with first-order correlation $\varrho$, so that
$[\mathbf{C}_X]_{ik} = \sigma^{2}\varrho^{|i-k|}$. The enhancement is compared to the already
investigated $\varrho = 0$ case (no memory) [31, ?], which exhibits the effect of stochastic
resonance as the variance of the noise increases. The information transferred increases with
memory, since the noise correlations effectively reduce its "corrupting" power (once the
stimulus statistics are properly matched to the noise structure, as shown by Eq. (28)). The
no-memory values are identical in all cases, since the noise correlations are ignored. Besides
the $\ell_2$-approximation, the first-order Markov approximation is also shown, obtained by
setting $n = 2$ in Eq. (?). For the Markov approximation the information capacity is lower,
since the neuron employs only the current and immediately preceding response values in the
decoding, neglecting the possibilities of the essentially infinite memory of the MP neuron.
Additional numerical calculations show that even small noise correlations
($\varrho \approx 0.2$) increase the capacity rates of the MP neuron by approx. 15 % (not shown
in Fig. 1).
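
For readers who wish to experiment with the model, the AR(1) noise covariance and the thresholding rule of Eq. (14) can be simulated directly. The following Python sketch is a minimal illustration under stated assumptions: the function names, the fixed random seed, the sequence length and the parameter values ($\theta_i = 0$, $a = 1$, $\sigma = 1$, $\varrho = 0.5$) are choices made here for illustration, and the simulation estimates only the marginal firing probability, not the capacity.

```python
import math
import random


def ar1_covariance(n, sigma, rho):
    """Covariance matrix [C_X]_ik = sigma^2 * rho^|i-k| of stationary AR(1) noise."""
    return [[sigma ** 2 * rho ** abs(i - k) for k in range(n)] for i in range(n)]


def simulate_mp_responses(theta, a, sigma, rho, rng):
    """Responses R_i = U(theta_i + X_i - a) with stationary gaussian AR(1) noise."""
    responses = []
    x = rng.gauss(0.0, sigma)                      # stationary start, variance sigma^2
    innov_sd = sigma * math.sqrt(1.0 - rho * rho)  # keeps the variance at sigma^2
    for th in theta:
        responses.append(1 if th + x - a > 0.0 else 0)
        x = rho * x + rng.gauss(0.0, innov_sd)
    return responses


rng = random.Random(0)
n = 100_000
spikes = simulate_mp_responses([0.0] * n, a=1.0, sigma=1.0, rho=0.5, rng=rng)
rate = sum(spikes) / n

# Marginal firing probability from Eq. (17) at theta = 0, a = sigma = 1.
p1_theory = 0.5 * (1.0 - math.erf(1.0 / math.sqrt(2.0)))
```

The empirical firing rate should be close to the value given by Eq. (17), since the marginal distribution of the noise does not depend on $\varrho$; the correlations show up only in the joint statistics of the responses.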

Our results lead us to comment on the optimality of information transfer in real neurons.
While the efficient coding hypothesis relies on the maximum information transfer, one should
keep in mind that from the information-theoretic perspective the coding-decoding operations
are an integral part of the information transmission process. First, it is well known [?] that
for some channels the optimal decoding process can be a very complex task, i.e., employing all
the responses obtained so far, as illustrated in this paper on the relatively simple example of
the MP neuron with memory. Since the nervous system is assumed to respond to spike trains in
real time [29], it is questionable whether real neurons try to achieve the true capacity, and
additional costs must be taken into account [40]. Second, the discrete, or impulse-like,
character of capacity-bearing stimulation is not limited to vanishing stimulus amplitudes. This
phenomenon occurs in most channels examined in the literature so far (with the
power-constrained AWGN channel and low-noise limit channels being the only known exceptions)
[41]. Another possible problem connected with the usage of a continuously varying stimulus is
that the complete specification of a particular $\theta$ requires an infinite amount of
information, while real neurons probably do not strive for precise specification of $\theta$.